Executive summary

The main goal of this report is to determine which symptoms of COVID-19 are most likely to lead to death. To that end, a dataset was downloaded and used in the following analysis. The process of gathering this data is described in this article.

The analysis consists of a short data characteristics section followed by four main parts. The first part attempts to determine which attributes are most strongly correlated with the patient's outcome (dead or alive). The second part presents an interactive plot for each of these attributes, along with a short explanation of what each attribute means biologically and why it may be correlated with the outcome. The third part is a classification attempt: a random forest classifier is trained on the most correlated attributes to assign patients to one of the two groups (dead or alive). The final model reaches an accuracy of 97%, despite being based on only five of the 78 numeric attributes in the original dataset. The last part shows which variables were the most important in the classification process. Lactate dehydrogenase had the biggest impact, with an importance value of 74.6, while the second most important variable, high-sensitivity C-reactive protein, scored less than half of that (27.2). This result is consistent with the article linked earlier, whose author used the same attributes in their analysis, which supports the findings of this report.

Loaded packages

##  [1] "kableExtra"  "caret"       "lattice"     "crosstalk"   "corrplot"   
##  [6] "corrr"       "openxlsx"    "plotly"      "ggplot2"     "formattable"
## [11] "tidyr"       "dplyr"       "knitr"       "stats"       "graphics"   
## [16] "grDevices"   "utils"       "datasets"    "methods"     "base"

Data characteristics

The provided data is organized so that each patient has several rows. Each row describes a single point in time at which a certain group of parameters was measured. Because of this layout there are many NA values in the data, both row-wise and column-wise (not every parameter was measured during a single examination).

Rows in the dataset: 6120
Columns in the dataset: 84
Numeric attributes: 78
First admission: 2020-01-10 15:52:20
Last discharge: 2020-03-04 16:21:51

Gender    Number of cases
Male      224
Female    151

Attribute  Min. value  1st Qu.  Median  Mean  3rd Qu.  Max. value  NA’s
PATIENT_ID Min. : 1.0 1st Qu.: 92.0 Median :185.0 Mean :184.8 3rd Qu.:270.0 Max. :375.0
age Min. :18.00 1st Qu.:47.00 Median :62.00 Mean :59.44 3rd Qu.:71.00 Max. :95.00
gender Min. :1.000 1st Qu.:1.000 Median :1.000 Mean :1.391 3rd Qu.:2.000 Max. :2.000
Admission_time Min. :2020-01-10 15:52:20 1st Qu.:2020-02-01 00:06:16 Median :2020-02-04 15:53:12 Mean :2020-02-03 18:57:56 3rd Qu.:2020-02-09 02:06:58 Max. :2020-02-17 21:30:07
Discharge_time Min. :2020-01-23 09:09:23 1st Qu.:2020-02-13 19:06:26 Median :2020-02-17 21:50:30 Mean :2020-02-16 21:40:09 3rd Qu.:2020-02-19 13:30:26 Max. :2020-03-04 16:21:51
outcome Min. :0.0000 1st Qu.:0.0000 Median :0.0000 Mean :0.4747 3rd Qu.:1.0000 Max. :1.0000
Hypersensitive_cardiac_troponinI Min. : 1.9 1st Qu.: 4.4 Median : 20.6 Mean : 1223.2 3rd Qu.: 223.8 Max. :50000.0 NA’s :5613
hemoglobin Min. : 6.4 1st Qu.:113.0 Median :125.0 Mean :123.1 3rd Qu.:137.0 Max. :178.0 NA’s :5145
Serum_chloride Min. : 71.50 1st Qu.: 99.05 Median :102.10 Mean :103.14 3rd Qu.:105.65 Max. :140.40 NA’s :5145
Prothrombin_time Min. : 11.50 1st Qu.: 13.60 Median : 14.80 Mean : 16.68 3rd Qu.: 16.70 Max. :120.00 NA’s :5458
procalcitonin Min. : 0.020 1st Qu.: 0.040 Median : 0.100 Mean : 1.107 3rd Qu.: 0.405 Max. :57.170 NA’s :5661
eosinophils… Min. :0.000 1st Qu.:0.000 Median :0.100 Mean :0.629 3rd Qu.:0.800 Max. :8.600 NA’s :5163
Interleukin_2_receptor Min. : 61.0 1st Qu.: 459.5 Median : 676.5 Mean : 907.2 3rd Qu.:1155.5 Max. :7500.0 NA’s :5852
Alkaline_phosphatase Min. : 17.00 1st Qu.: 54.00 Median : 69.50 Mean : 82.47 3rd Qu.: 95.00 Max. :620.00 NA’s :5190
albumin Min. :13.60 1st Qu.:27.40 Median :32.20 Mean :32.01 3rd Qu.:36.60 Max. :48.60 NA’s :5186
basophil… Min. :0.00 1st Qu.:0.10 Median :0.20 Mean :0.21 3rd Qu.:0.30 Max. :1.70 NA’s :5163
Interleukin_10 Min. : 5.00 1st Qu.: 5.00 Median : 5.90 Mean : 16.07 3rd Qu.: 12.35 Max. :1000.00 NA’s :5853
Total_bilirubin Min. : 2.50 1st Qu.: 7.40 Median : 10.70 Mean : 16.70 3rd Qu.: 16.77 Max. :505.70 NA’s :5190
Platelet_count Min. : -1.0 1st Qu.:109.0 Median :178.0 Mean :184.3 3rd Qu.:248.0 Max. :558.0 NA’s :5163
monocytes… Min. : 0.300 1st Qu.: 2.800 Median : 5.700 Mean : 6.155 3rd Qu.: 8.600 Max. :53.000 NA’s :5162
antithrombin Min. : 20.00 1st Qu.: 74.00 Median : 86.00 Mean : 85.32 3rd Qu.: 97.00 Max. :136.00 NA’s :5790
Interleukin_8 Min. : 5.000 1st Qu.: 8.675 Median : 16.000 Mean : 83.088 3rd Qu.: 35.200 Max. :6795.000 NA’s :5852
indirect_bilirubin Min. : 0.100 1st Qu.: 3.800 Median : 5.400 Mean : 6.889 3rd Qu.: 8.000 Max. :145.100 NA’s :5214
Red_blood_cell_distribution_width Min. :10.60 1st Qu.:12.00 Median :12.60 Mean :13.07 3rd Qu.:13.70 Max. :27.10 NA’s :5197
neutrophils_percent Min. : 1.7 1st Qu.:65.1 Median :82.4 Mean :77.6 3rd Qu.:92.3 Max. :98.9 NA’s :5163
total_protein Min. :31.80 1st Qu.:61.00 Median :65.90 Mean :65.30 3rd Qu.:70.45 Max. :88.70 NA’s :5189
Quantification_of_Treponema_pallidum_antibodies Min. : 0.020 1st Qu.: 0.040 Median : 0.050 Mean : 0.132 3rd Qu.: 0.070 Max. :11.950 NA’s :5841
Prothrombin_activity Min. : 6.00 1st Qu.: 65.00 Median : 81.00 Mean : 78.55 3rd Qu.: 95.00 Max. :142.00 NA’s :5461
HBsAg Min. : 0.000 1st Qu.: 0.000 Median : 0.010 Mean : 8.306 3rd Qu.: 0.010 Max. :250.000 NA’s :5841
mean_corpuscular_volume Min. : 61.60 1st Qu.: 86.90 Median : 90.10 Mean : 90.39 3rd Qu.: 93.90 Max. :118.90 NA’s :5163
hematocrit Min. :14.50 1st Qu.:33.50 Median :36.60 Mean :36.55 3rd Qu.:39.90 Max. :52.30 NA’s :5163
White_blood_cell_count Min. : 0.13 1st Qu.: 4.94 Median : 7.72 Mean : 15.60 3rd Qu.: 12.72 Max. :1726.60 NA’s :4993
Tumor_necrosis_factorα Min. : 4.00 1st Qu.: 6.70 Median : 8.60 Mean : 11.58 3rd Qu.: 11.50 Max. :168.00 NA’s :5852
mean_corpuscular_hemoglobin_concentration Min. :286.0 1st Qu.:333.0 Median :343.0 Mean :342.8 3rd Qu.:350.0 Max. :514.0 NA’s :5163
fibrinogen Min. : 0.500 1st Qu.: 3.050 Median : 4.120 Mean : 4.294 3rd Qu.: 5.480 Max. :10.780 NA’s :5554
Interleukin_1β Min. : 5.00 1st Qu.: 5.00 Median : 5.00 Mean : 6.51 3rd Qu.: 5.00 Max. :88.50 NA’s :5852
Urea Min. : 0.800 1st Qu.: 4.000 Median : 5.985 Mean : 9.589 3rd Qu.:11.400 Max. :68.400 NA’s :5184
lymphocyte_count Min. : 0.000 1st Qu.: 0.460 Median : 0.800 Mean : 1.017 3rd Qu.: 1.310 Max. :52.420 NA’s :5163
PH_value Min. :5.000 1st Qu.:6.000 Median :6.500 Mean :6.484 3rd Qu.:7.294 Max. :7.565 NA’s :5736
Red_blood_cell_count Min. : 0.100 1st Qu.: 3.680 Median : 4.140 Mean : 9.288 3rd Qu.: 4.650 Max. :749.500 NA’s :4993
Eosinophil_count Min. :0.000 1st Qu.:0.000 Median :0.010 Mean :0.039 3rd Qu.:0.060 Max. :0.490 NA’s :5163
Corrected_calcium Min. :1.650 1st Qu.:2.270 Median :2.360 Mean :2.355 3rd Qu.:2.440 Max. :2.790 NA’s :5206
Serum_potassium Min. : 2.760 1st Qu.: 3.950 Median : 4.410 Mean : 4.509 3rd Qu.: 4.870 Max. :12.800 NA’s :5140
glucose Min. : 1.000 1st Qu.: 5.550 Median : 6.990 Mean : 8.889 3rd Qu.:10.260 Max. :43.010 NA’s :5345
neutrophils_count Min. : 0.06 1st Qu.: 3.09 Median : 5.85 Mean : 7.81 3rd Qu.:10.95 Max. :33.88 NA’s :5163
Direct_bilirubin Min. : 1.600 1st Qu.: 3.225 Median : 4.800 Mean : 9.887 3rd Qu.: 8.275 Max. :360.600 NA’s :5190
Mean_platelet_volume Min. : 8.50 1st Qu.:10.10 Median :10.80 Mean :10.91 3rd Qu.:11.50 Max. :15.00 NA’s :5258
ferritin Min. : 17.8 1st Qu.: 377.2 Median : 711.0 Mean : 1379.1 3rd Qu.: 1425.2 Max. :50000.0 NA’s :5837
RBC_distribution_width_SD Min. : 31.30 1st Qu.: 38.50 Median : 40.90 Mean : 42.44 3rd Qu.: 44.70 Max. :113.30 NA’s :5197
Thrombin_time Min. : 13.00 1st Qu.: 15.60 Median : 16.80 Mean : 18.17 3rd Qu.: 18.38 Max. :161.90 NA’s :5554
lymphocyte_percent Min. : 0.000 1st Qu.: 3.925 Median :11.450 Mean :15.392 3rd Qu.:24.975 Max. :60.000 NA’s :5162
HCV_antibody_quantification Min. :0.020 1st Qu.:0.040 Median :0.060 Mean :0.117 3rd Qu.:0.090 Max. :2.090 NA’s :5841
D.D_dimer Min. : 0.210 1st Qu.: 0.603 Median : 2.155 Mean : 7.943 3rd Qu.:21.000 Max. :60.000 NA’s :5490
Total_cholesterol Min. :0.100 1st Qu.:3.010 Median :3.630 Mean :3.689 3rd Qu.:4.265 Max. :7.300 NA’s :5189
aspartate_aminotransferase Min. : 6.00 1st Qu.: 19.50 Median : 27.00 Mean : 46.53 3rd Qu.: 42.00 Max. :1858.00 NA’s :5185
Uric_acid Min. : 43.0 1st Qu.: 183.2 Median : 243.7 Mean : 276.1 3rd Qu.: 333.8 Max. :1176.0 NA’s :5186
HCO3. Min. : 6.30 1st Qu.:21.00 Median :23.50 Mean :23.14 3rd Qu.:25.90 Max. :36.30 NA’s :5186
calcium Min. :1.170 1st Qu.:1.980 Median :2.080 Mean :2.078 3rd Qu.:2.190 Max. :2.620 NA’s :5141
Amino.terminal_brain_natriuretic_peptide_precursor.NT.proBNP. Min. : 5 1st Qu.: 150 Median : 585 Mean : 3669 3rd Qu.: 2625 Max. :70000 NA’s :5645
Lactate_dehydrogenase Min. : 110.0 1st Qu.: 218.0 Median : 340.0 Mean : 474.2 3rd Qu.: 601.8 Max. :1867.0 NA’s :5186
platelet_large_cell_ratio Min. :11.20 1st Qu.:25.60 Median :30.90 Mean :31.77 3rd Qu.:37.20 Max. :62.20 NA’s :5258
Interleukin_6 Min. : 1.500 1st Qu.: 4.772 Median : 19.265 Mean : 112.308 3rd Qu.: 60.167 Max. :5000.000 NA’s :5848
Fibrin_degradation_products Min. : 4.00 1st Qu.: 4.00 Median : 17.90 Mean : 61.35 3rd Qu.:150.00 Max. :190.80 NA’s :5790
monocytes_count Min. : 0.010 1st Qu.: 0.270 Median : 0.410 Mean : 0.526 3rd Qu.: 0.580 Max. :39.920 NA’s :5163
PLT_distribution_width Min. : 8.00 1st Qu.:11.10 Median :12.40 Mean :13.01 3rd Qu.:14.30 Max. :25.30 NA’s :5258
globulin Min. :10.10 1st Qu.:29.70 Median :32.70 Mean :33.24 3rd Qu.:36.50 Max. :50.60 NA’s :5190
γ.glutamyl_transpeptidase Min. : 3.00 1st Qu.: 22.00 Median : 34.00 Mean : 55.34 3rd Qu.: 58.00 Max. :732.00 NA’s :5190
International_standard_ratio Min. : 0.840 1st Qu.: 1.030 Median : 1.140 Mean : 1.313 3rd Qu.: 1.330 Max. :13.480 NA’s :5461
basophil_count… Min. :0.000 1st Qu.:0.010 Median :0.010 Mean :0.017 3rd Qu.:0.020 Max. :0.120 NA’s :5163
X2019.nCoV_nucleic_acid_detection Min. :-1 1st Qu.:-1 Median :-1 Mean :-1 3rd Qu.:-1 Max. :-1 NA’s :5619
mean_corpuscular_hemoglobin Min. :20.4 1st Qu.:29.7 Median :30.9 Mean :31.0 3rd Qu.:32.2 Max. :50.8 NA’s :5163
Activation_of_partial_thromboplastin_time Min. : 21.80 1st Qu.: 35.30 Median : 39.20 Mean : 41.52 3rd Qu.: 44.12 Max. :144.00 NA’s :5552
hsCRP Min. : 0.10 1st Qu.: 5.70 Median : 51.50 Mean : 76.24 3rd Qu.:118.50 Max. :320.00 NA’s :5383
HIV_antibody_quantification Min. :0.05 1st Qu.:0.07 Median :0.09 Mean :0.10 3rd Qu.:0.11 Max. :0.27 NA’s :5842
serum_sodium Min. :115.4 1st Qu.:137.7 Median :140.4 Mean :141.6 3rd Qu.:143.5 Max. :179.7 NA’s :5145
thrombocytocrit Min. :0.010 1st Qu.:0.150 Median :0.210 Mean :0.212 3rd Qu.:0.270 Max. :0.510 NA’s :5258
ESR Min. : 1.00 1st Qu.: 14.00 Median : 28.00 Mean : 33.69 3rd Qu.: 45.50 Max. :110.00 NA’s :5737
glutamic.pyruvic_transaminase Min. : 5.00 1st Qu.: 16.00 Median : 24.00 Mean : 38.86 3rd Qu.: 41.00 Max. :1600.00 NA’s :5189
eGFR Min. : 2.00 1st Qu.: 63.58 Median : 87.90 Mean : 81.56 3rd Qu.:103.97 Max. :224.00 NA’s :5184
creatinine Min. : 11.00 1st Qu.: 58.00 Median : 76.00 Mean : 109.93 3rd Qu.: 98.25 Max. :1497.00 NA’s :5184
Measurement_time Min. :2020-01-10 19:45:00 1st Qu.:2020-02-04 13:44:00 Median :2020-02-09 12:42:30 Mean :2020-02-08 07:00:02 3rd Qu.:2020-02-13 10:34:00 Max. :2020-02-18 17:49:00 NA’s :14
outcome_text Alive:3215 Dead :2905
Gender Male :3730 Female:2390
Normalized_time Min. : 0.00 1st Qu.: 1.25 Median : 56.85 Mean : 98.71 3rd Qu.:167.29 Max. :524.25 NA’s :14

Determining the correlation

To create a correlation matrix, all measurements of every patient have to be aggregated into a single row. Hence an aggregation method must be chosen for columns containing more than one value. In the following block three different data frames are created. Each of them uses a different aggregation method - mean, max and last. The “last” method means that only the most recent data is taken into consideration. All of these data frames are then used to create three correlation data frames with a package named corrr, which makes it possible to skip the step of creating a correlation matrix and converting it into a data frame. In the following blocks and explanations I will refer to these three methods as the “mean”, “max” and “last” correlations.
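
The aggregation step described above can be sketched in base R as follows. This is only an illustration on made-up values, not the code used in the report: the toy data frame and the helper names `last_non_na` and `aggregate_patients` are assumptions, and the original analysis may well have used dplyr instead.

```r
# Toy long-format data: several measurement rows per patient, with NAs.
# All values are invented for illustration.
measurements <- data.frame(
  PATIENT_ID = c(1, 1, 1, 2, 2),
  Measurement_time = as.POSIXct(
    c("2020-02-01 08:00:00", "2020-02-02 08:00:00", "2020-02-03 08:00:00",
      "2020-02-01 09:00:00", "2020-02-04 09:00:00"),
    tz = "UTC"
  ),
  hsCRP = c(120, NA, 80, 10, 4),
  albumin = c(NA, 30, 28, 38, NA)
)

# Hypothetical helper: last non-missing value (NA if nothing was measured).
last_non_na <- function(x) {
  x <- x[!is.na(x)]
  if (length(x) == 0) NA_real_ else x[length(x)]
}

# Hypothetical helper: collapse every value column to one number per patient.
aggregate_patients <- function(df, fun) {
  df <- df[order(df$PATIENT_ID, df$Measurement_time), ]
  value_cols <- setdiff(names(df), c("PATIENT_ID", "Measurement_time"))
  aggregate(df[value_cols], by = list(PATIENT_ID = df$PATIENT_ID), FUN = fun)
}

# Note: max() over a column with no measured values would return -Inf.
by_mean <- aggregate_patients(measurements, function(x) mean(x, na.rm = TRUE))
by_max  <- aggregate_patients(measurements, function(x) max(x, na.rm = TRUE))
by_last <- aggregate_patients(measurements, last_non_na)

print(by_last)
```

Each resulting data frame has exactly one row per patient, which is the shape a correlation matrix needs.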

The corrr library makes it possible to select a specific attribute for the analysis to “focus” on, which means that it filters out all correlations not connected to the selected attribute. In this study we want to determine which attributes are associated with which outcome of the disease, so the focused attribute is “outcome”. The results are shown below in the form of bar plots. To keep the plots readable, only correlations higher than 0.6 or lower than -0.6 are shown. Hovering over a bar shows the precise value of the correlation.
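
The idea of focusing on one attribute and keeping only strong correlations can be illustrated without corrr. The sketch below uses base R's `cor` on a made-up per-patient table; it is an assumed equivalent of the filtering described above, not the report's actual code.

```r
# Toy per-patient table (one row per patient); all values are made up.
patients <- data.frame(
  outcome = c(0, 0, 0, 0, 1, 1, 1, 1),
  Lactate_dehydrogenase = c(200, 220, 250, 240, 700, 650, 800, 900),
  albumin = c(40, 38, 37, 39, 28, 30, 26, 25),
  serum_sodium = c(140, 138, 141, 139, 139, 141, 138, 140)
)

# Correlation of every attribute with every other, ignoring missing pairs.
cor_matrix <- cor(patients, use = "pairwise.complete.obs")

# "Focus" on the outcome column and drop its correlation with itself.
with_outcome <- cor_matrix[, "outcome"]
with_outcome <- with_outcome[names(with_outcome) != "outcome"]

# Keep only strong correlations, mirroring the |r| > 0.6 rule of the plots.
strong <- with_outcome[abs(with_outcome) > 0.6]
print(round(sort(strong), 3))
```

In this toy example serum sodium is dropped because its correlation with the outcome is close to zero, while lactate dehydrogenase (positive) and albumin (negative) are kept.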

The correlation plots show that, no matter which aggregation method is used, the same group of attributes is most strongly correlated with the outcome. There are some differences, but overall these are the same attributes repeated three times. Because of that, the following analysis will focus mostly on neutrophils (percentage), fibrin degradation products (since D-dimer is its subtype it won’t be included), lactate dehydrogenase, high-sensitivity C-reactive protein, calcium, prothrombin activity, albumin and lymphocyte percentage.

Analysis of the selected attributes

There are several interactive plots presented in this section. For visualization purposes the timestamp of each measurement was normalized - it was replaced with the difference between the actual measurement time and the time of the first measurement of the given patient. As a result the Normalized_time variable contains the number of hours that had passed since the patient's first examination. This approach makes it possible to visualize and compare the course of a certain attribute among numerous patients on a single plot.
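
A minimal base R sketch of this normalization is shown below. The timestamps are invented, and the use of `ave` is just one possible way to compute a per-patient minimum; the report itself may have done this differently.

```r
# Toy measurement log; timestamps are invented for illustration.
measurements <- data.frame(
  PATIENT_ID = c(1, 1, 1, 2, 2),
  Measurement_time = as.POSIXct(
    c("2020-02-01 08:00:00", "2020-02-01 20:00:00", "2020-02-03 08:00:00",
      "2020-02-05 10:00:00", "2020-02-05 11:30:00"),
    tz = "UTC"
  )
)

# First measurement time of each patient (in seconds since the epoch),
# repeated on every row belonging to that patient.
first_time <- ave(as.numeric(measurements$Measurement_time),
                  measurements$PATIENT_ID, FUN = min)

# Hours elapsed since the patient's first measurement.
measurements$Normalized_time <-
  (as.numeric(measurements$Measurement_time) - first_time) / 3600

print(measurements)
```

Every patient's first row gets a Normalized_time of 0, so the courses of different patients start at the same point on the time axis.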

Neutrophils percentage

In a healthy person neutrophils should make up about 55-70% of white blood cells. The plot shows clearly that deceased patients had a very high percentage of neutrophils throughout the whole course of their treatment. For the patients who lived, the percentage of neutrophils was either in the specified healthy range or decreased during the treatment.

High-sensitivity C-reactive protein

This plot shows extremely chaotic data for the deceased patients. There is practically no trend, and little more can be said about this data except that their hsCRP levels are quite high compared to those of the patients who lived. If we select only the Alive patients, we can see that in almost every case hsCRP was decreasing over time. hsCRP is a blood test that measures the level of inflammation in the body; it is used, for example, to estimate the risk of heart disease or stroke. A high hsCRP value means strong inflammation, which is consistent with the fact that COVID-19 patients with high hsCRP died.

Fibrin degradation products

Fibrin degradation products (FDP) are components of the blood produced by clot degradation. The value of FDP is high after any thrombotic event. The chaotic data on the plot might indicate that the patients with high FDP (all of whom later died) suffered from some kind of blood dysfunction.

Lactate dehydrogenase

Lactate dehydrogenase is an enzyme that is present in almost every living cell. High levels (up to 4 times higher in deceased patients than in survivors) can indicate an early stage of a heart attack and are in general a negative prognostic factor.

Calcium

Lower levels of calcium among deceased patients can indicate numerous things. Hypocalcemia in particular can lead to several muscle-related problems, such as tetany or even disrupted conduction in the cardiac tissue. The effect of low calcium levels has been researched and is described in this article.

Prothrombin activity

Prothrombin is a coagulation factor, which means that its role is to manage the clotting process. Low levels of prothrombin activity are related to fibrin degradation products. The low prothrombin activity that occurred among deceased patients can indicate problems with the clotting process.

Albumin

Albumin is the main protein in human blood, making up about 60% of all blood proteins. Its main role is to maintain proper oncotic pressure, which prevents water and electrolytes from leaking out of the blood vessels into tissues. A healthy person should have an albumin level ranging from 30 to 55 mg/ml of blood.

Lymphocyte percentage

Lymphocytes are, next to neutrophils, one of the five kinds of white blood cells. Low levels of lymphocytes can indicate autoimmune diseases, AIDS or other infectious diseases.

Classification

The dataset for the classification problem cannot contain NA values if Random Forest is used as the training method. Because of that, only a few columns were chosen for the classification problem:

  • Lymphocyte percentage
  • Neutrophils percentage
  • High-sensitivity C-reactive protein
  • Lactate dehydrogenase
  • Albumin

These are the attributes that showed the highest correlation with the outcome, as shown in the “Determining the correlation” section.
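
Removing incomplete rows and splitting the remaining patients into training and testing sets could look roughly like the base R sketch below. The data is synthetic, the seed is arbitrary, and the 70/30 stratified split is an assumption inferred from the set sizes reported next; the report does not state how the split was actually made.

```r
set.seed(42)  # arbitrary seed, only to make the sketch reproducible

# Synthetic per-patient table with some missing values (not real data).
n <- 40
patients <- data.frame(
  outcome_text = factor(rep(c("Alive", "Dead"), each = n / 2)),
  Lactate_dehydrogenase = c(rnorm(n / 2, 250, 40), rnorm(n / 2, 700, 150)),
  hsCRP = c(rnorm(n / 2, 20, 5), rnorm(n / 2, 120, 30))
)
patients$hsCRP[c(3, 25)] <- NA  # pretend two patients lack an hsCRP value

# Drop patients with any missing predictor.
complete <- patients[complete.cases(patients), ]

# Stratified split: sample 70% of the rows within each outcome class.
# The 70/30 ratio is an assumption, not taken from the report.
train_idx <- unlist(lapply(
  split(seq_len(nrow(complete)), complete$outcome_text),
  function(rows) rows[sample.int(length(rows), floor(0.7 * length(rows)))]
))
train_set <- complete[train_idx, ]
test_set  <- complete[-train_idx, ]

cat("Size of the training set: ", nrow(train_set), "\n")
cat("Size of the testing set: ", nrow(test_set), "\n")
```

Sampling within each class keeps the proportion of Alive and Dead patients similar in both sets.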

## Size of the training set:  247
## Size of the testing set:  104

Training and predicting without parameter optimization

  • Control parameters for the train function:
    • Method: repeatedcv (repeated cross-validation)
    • Number of folds: 2
    • Number of complete sets of folds to compute: 5
  • The train function parameters:
    • Method: Random Forest
    • Number of trees: 10
## Random Forest 
## 
## 247 samples
##   5 predictor
##   2 classes: 'Alive', 'Dead' 
## 
## No pre-processing
## Resampling: Cross-Validated (2 fold, repeated 5 times) 
## Summary of sample sizes: 124, 123, 124, 123, 123, 124, ... 
## Resampling results across tuning parameters:
## 
##   mtry  Accuracy   Kappa    
##   2     0.9684173  0.9363620
##   3     0.9651915  0.9297891
##   5     0.9538618  0.9068729
## 
## Accuracy was used to select the optimal model using the largest value.
## The final value used for the model was mtry = 2.
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction Alive Dead
##      Alive    54    1
##      Dead      3   46
##                                           
##                Accuracy : 0.9615          
##                  95% CI : (0.9044, 0.9894)
##     No Information Rate : 0.5481          
##     P-Value [Acc > NIR] : <2e-16          
##                                           
##                   Kappa : 0.9226          
##                                           
##  Mcnemar's Test P-Value : 0.6171          
##                                           
##               Precision : 0.9818          
##                  Recall : 0.9474          
##                      F1 : 0.9643          
##              Prevalence : 0.5481          
##          Detection Rate : 0.5192          
##    Detection Prevalence : 0.5288          
##       Balanced Accuracy : 0.9630          
##                                           
##        'Positive' Class : Alive           
## 

Training and predicting with parameter optimization

  • Control parameters for the train function:
    • Method: repeatedcv (repeated cross-validation)
    • Summary function: twoClassSummary
    • Number of folds: 2
    • Number of complete sets of folds to compute: 5
  • The train function parameters:
    • Method: Random Forest
    • Metric: ROC
    • Number of trees: 30
    • Tune grid: 1:5
    • Pre-processing: center, scale
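
The settings listed above might translate into a caret call along the following lines. This is a hedged sketch, not the report's code: the data is synthetic, the seed is arbitrary, and the model is only fitted if the caret, randomForest and pROC packages happen to be installed.

```r
# Settings transcribed from the list above.
tune_grid <- data.frame(mtry = 1:5)  # tune grid: 1:5
n_trees <- 30                        # number of trees

# Fit only when the required packages are available.
needed <- c("caret", "randomForest", "pROC")
if (all(vapply(needed, requireNamespace, logical(1), quietly = TRUE))) {
  set.seed(1)  # arbitrary seed

  # Synthetic stand-in for the five predictors (not real patient data).
  n <- 60
  y <- factor(rep(c("Alive", "Dead"), each = n / 2))
  x <- data.frame(
    Lactate_dehydrogenase = c(rnorm(n / 2, 250, 60), rnorm(n / 2, 650, 200)),
    hsCRP                 = c(rnorm(n / 2, 30, 20), rnorm(n / 2, 120, 50)),
    neutrophils_percent   = c(rnorm(n / 2, 65, 10), rnorm(n / 2, 90, 5)),
    lymphocyte_percent    = c(rnorm(n / 2, 25, 8), rnorm(n / 2, 6, 3)),
    albumin               = c(rnorm(n / 2, 37, 4), rnorm(n / 2, 28, 4))
  )

  # Repeated cross-validation: 2 folds, 5 repeats, ROC-based summary.
  ctrl <- caret::trainControl(
    method = "repeatedcv", number = 2, repeats = 5,
    classProbs = TRUE, summaryFunction = caret::twoClassSummary
  )

  fit <- caret::train(
    x = x, y = y, method = "rf", metric = "ROC",
    trControl = ctrl, tuneGrid = tune_grid,
    preProcess = c("center", "scale"), ntree = n_trees
  )
  print(fit$bestTune)
}
```

`classProbs = TRUE` is needed here because `twoClassSummary` computes the ROC from class probabilities.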
## Random Forest 
## 
## 247 samples
##   5 predictor
##   2 classes: 'Alive', 'Dead' 
## 
## Pre-processing: centered (5), scaled (5) 
## Resampling: Cross-Validated (2 fold, repeated 5 times) 
## Summary of sample sizes: 124, 123, 124, 123, 123, 124, ... 
## Resampling results across tuning parameters:
## 
##   mtry  ROC        Sens       Spec     
##   1     0.9877812  0.9630597  0.9517857
##   2     0.9883376  0.9674495  0.9732143
##   3     0.9906038  0.9645083  0.9732143
##   4     0.9877185  0.9630597  0.9571429
##   5     0.9868392  0.9615672  0.9553571
## 
## ROC was used to select the optimal model using the largest value.
## The final value used for the model was mtry = 3.
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction Alive Dead
##      Alive    55    1
##      Dead      2   46
##                                         
##                Accuracy : 0.9712        
##                  95% CI : (0.918, 0.994)
##     No Information Rate : 0.5481        
##     P-Value [Acc > NIR] : <2e-16        
##                                         
##                   Kappa : 0.9419        
##                                         
##  Mcnemar's Test P-Value : 1             
##                                         
##               Precision : 0.9821        
##                  Recall : 0.9649        
##                      F1 : 0.9735        
##              Prevalence : 0.5481        
##          Detection Rate : 0.5288        
##    Detection Prevalence : 0.5385        
##       Balanced Accuracy : 0.9718        
##                                         
##        'Positive' Class : Alive         
## 

Accuracy is 1 percentage point better than before parameter tuning (0.9712 vs 0.9615), the Kappa value is 0.02 higher (0.9419 vs 0.9226), and the values of the remaining measures are the same or higher than before. Because the Random Forest method was already very accurate, no other methods were tested.

Both high precision and high recall mean that the classifier performs well, since it returns few false positives and few false negatives. Failing to detect severely ill people can, however, be quite problematic, since it could increase the strain on the medical system even more.
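
As a cross-check, the reported measures can be recomputed by hand from the confusion matrix of the tuned model. The sketch below uses only base R and the four counts printed above; the variable names are illustrative.

```r
# Confusion matrix of the tuned model (rows: prediction, columns: reference).
cm <- matrix(c(55, 2, 1, 46), nrow = 2,
             dimnames = list(Prediction = c("Alive", "Dead"),
                             Reference  = c("Alive", "Dead")))

total <- sum(cm)
tp <- cm["Alive", "Alive"]  # the positive class is 'Alive'
fp <- cm["Alive", "Dead"]   # predicted Alive, actually Dead
fn <- cm["Dead", "Alive"]   # predicted Dead, actually Alive

accuracy  <- sum(diag(cm)) / total
precision <- tp / (tp + fp)
recall    <- tp / (tp + fn)
f1        <- 2 * precision * recall / (precision + recall)

# Cohen's kappa: observed agreement corrected for chance agreement.
expected    <- sum(rowSums(cm) * colSums(cm)) / total^2
cohen_kappa <- (accuracy - expected) / (1 - expected)

print(round(c(accuracy = accuracy, precision = precision, recall = recall,
              f1 = f1, kappa = cohen_kappa), 4))
```

The rounded results (0.9712, 0.9821, 0.9649, 0.9735 and 0.9419) agree with the caret output above.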

Importance of the attributes of the final model

## rf variable importance
## 
##                       Overall
## Lactate_dehydrogenase  74.610
## hsCRP                  27.151
## neutrophils_percent    15.245
## lymphocyte_percent      3.140
## albumin                 2.119

The trained model shows that lactate dehydrogenase levels have the largest impact on predicting whether a patient will die or not. High-sensitivity C-reactive protein is less than half as important, and the neutrophils percentage comes in third place. This outcome is confirmed by the article from which the dataset originates.